data analysis
Multiparameter Persistence Images for Topological Machine Learning
However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results.
Scattering GCN: Overcoming Oversmoothness in Graph Convolutional Networks - Supplement
Now, since $|N(v)| = \beta$, it holds that $(Px)[v] = \frac{a+b}{2}$, thus verifying the first claim of the lemma, as the choice of $v$ was arbitrary. This construction essentially generalizes the graph demonstrated in Figure 1 of the main paper (see Sec. 7). The following lemma shows that on such graphs, the filter responses of $g_\theta$ for a constant signal will encode some geometric information, but will not distinguish between the cycles in the graph. These responses with appropriate color coding give the illustration in Figure 1 in the main paper. Validation & testing procedure: All tests were done using train-validation-test splits of the datasets, where validation accuracy is used for tuning hyperparameters and test accuracy is reported in the comparison table.
Hierarchical topological clustering
Topological methods can explore data clouds without making assumptions about their structure. Here we propose a hierarchical topological clustering algorithm that can be implemented with any choice of distance. The persistence of outliers and of clusters of arbitrary shape is inferred from the resulting hierarchy. We demonstrate the potential of the algorithm on selected datasets in which outliers play relevant roles, consisting of images, medical data, and economic data. These methods can provide meaningful clusters in situations in which other techniques fail to do so.
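To make the idea of a distance-agnostic hierarchy concrete, here is a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes a plain single-linkage merge hierarchy, which records the scale at which connected components join. The function name `merge_hierarchy` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def merge_hierarchy(
    points: Sequence[T], dist: Callable[[T, T], float]
) -> List[Tuple[float, int, int]]:
    """Single-linkage hierarchy (an assumed stand-in, not the paper's method).

    Returns one (scale, larger_size, smaller_size) tuple per merge, in order
    of increasing scale. Any symmetric distance function can be supplied.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices
    size = [1] * n           # component size, valid at each root

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise distances from smallest to largest.
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    merges: List[Tuple[float, int, int]] = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same component at a smaller scale
        if size[ri] < size[rj]:
            ri, rj = rj, ri  # keep ri as the larger component
        merges.append((d, size[ri], size[rj]))
        parent[rj] = ri
        size[ri] += size[rj]
    return merges


# Toy 1-D data: three nearby points and one distant point.
merges = merge_hierarchy([0.0, 1.0, 2.0, 10.0], lambda a, b: abs(a - b))
print(merges)
```

In this toy run the distant point stays a singleton until the last merge, at a much larger scale than the others. That is the sense in which a hierarchy can expose persistent outliers. Swapping in a different `dist` changes the scales but not the procedure.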
NanoBaseLib: A Multi-Task Benchmark Dataset for Nanopore Sequencing
Nanopore sequencing is a third-generation sequencing technology capable of generating long reads and directly measuring modifications on DNA/RNA molecules, which makes it ideal for biological applications such as human Telomere-to-Telomere (T2T) genome assembly, Ebola virus surveillance, and COVID-19 mRNA vaccine development. However, the accuracy of computational methods across Nanopore sequencing data analysis tasks remains far from satisfactory. For instance, the base calling accuracy of Nanopore RNA sequencing is $\sim$90\%, while the aim is $\sim$99.9\%. This highlights an urgent need for contributions from the machine learning community. A bottleneck that prevents machine learning researchers from entering this field is the lack of a large integrated benchmark dataset.
On Tractable Computation of Expected Predictions
Computing expected predictions of discriminative models is a fundamental task in machine learning that appears in many interesting applications such as fairness, handling missing values, and data analysis. Unfortunately, computing expectations of a discriminative model with respect to a probability distribution defined by an arbitrary generative model has been proven to be hard in general. In fact, the task is intractable even for simple models such as logistic regression and a naive Bayes distribution. In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations, as well as moments of any order, of the latter with respect to the former in the case of regression. Specifically, we consider expressive probabilistic circuits with certain structural constraints that support tractable probabilistic inference. Moreover, we exploit the tractable computation of high-order moments to derive an algorithm to approximate the expectations for classification scenarios in which exact computations are intractable. Our framework for computing expected predictions handles missing data at prediction time in a principled and accurate way and enables reasoning about the behavior of discriminative models. We empirically show that our algorithm consistently outperforms standard imputation techniques on a variety of datasets. Finally, we illustrate how our framework can be used for exploratory data analysis.
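As a toy illustration of what an expected prediction is, the sketch below uses a much simpler pairing than the probabilistic circuits in the paper. It assumes a linear regressor and a fully factorized feature distribution with known per-feature means and variances. Under these assumptions the mean and variance of the prediction, given the observed features, have closed forms. The function name and all numbers are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def expected_linear_prediction(
    weights: Sequence[float],
    bias: float,
    means: Sequence[float],
    variances: Sequence[float],
    observed: Dict[int, float],
) -> Tuple[float, float]:
    """Mean and variance of f(x) = w.x + b when some features are missing.

    Assumes independent features (a fully factorized distribution), so a
    missing feature keeps its marginal mean and variance after conditioning
    on the observed ones.
    """
    mean = bias
    var = 0.0
    for i, w in enumerate(weights):
        if i in observed:
            mean += w * observed[i]       # observed value contributes exactly
        else:
            mean += w * means[i]          # E[w_i * X_i] = w_i * mu_i
            var += w * w * variances[i]   # Var[w_i * X_i] = w_i^2 * sigma_i^2
    return mean, var


# Feature 0 is observed; features 1 and 2 are missing at prediction time.
mean, var = expected_linear_prediction(
    weights=[2.0, -1.0, 0.5],
    bias=1.0,
    means=[0.0, 3.0, 4.0],
    variances=[1.0, 4.0, 1.0],
    observed={0: 1.5},
)
print(mean, var)
```

In this special case the expected prediction coincides with mean imputation. The two differ once the model is nonlinear or the features are dependent, which is the regime the abstract addresses.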